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Abstract 

In order to accelerate molecular dynamics simulations, it is very com- 
mon to impose holonomic constraints on their hardest degrees of freedom. 
In this way, the time step used to integrate the equations of motion can be 
increased, thus allowing, in principle, to reach longer total simulation times. 
The imposition of such constraints results in an aditional set of Nc equations 
(the equations of constraint) and unknowns (their associated Lagrange mul- 
tipliers), that must be solved in one way or another at each time step of the 
dynamics. In this work it is shown that, due to the essentially linear structure 
of typical biological polymers, such as nucleic acids or proteins, the alge- 
braic equations that need to be solved involve a matrix which is banded if the 
constraints are indexed in a clever way. This allows to obtain the Lagrange 
multipliers through a non-iterative procedure, which can be considered exact 
up to machine precision, and which takes 0{N^) operations, instead of the 
usual 0{NI) for generic molecular systems. We develop the formalism, and 
describe the appropriate indexing for a number of model molecules and also 
for alkanes, proteins and DNA. Finally, we provide a numerical example of 
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the technique in a series of polyalanine peptides of different lengths using 
the AMBER molecular dynamics package. 

Keywords: constraints, Lagrange multipliers, banded systems, molecular 
dynamics, proteins, DNA 



1 Introduction 

Due to the high frequency of the fastest internal motions in molecular systems, 
the discrete time step for molecular dynamics simulations must be very small (of 
the order of femtoseconds), while the actual span of biochemical proceses typi- 
cally require the choice of relatively long total times for simulations (e.g., from 
microseconds to milliseconds for protein folding processes). In addition to this, 
since biologically interesting molecules (like proteins [HI and DNA [2]) consist of 
thousands of atoms, their trajectories in configuration space are esentially chaotic, 
and therefore reliable quantities can be obtained from the simulation only after 
statistical analysis [3]. In order to cope with these two requirements, which force 
the computation of a large number of dynamical steps if predictions want to be 
made, great efforts are being done both in hardware [31 |5l and in software [0 [7][ 
solutions. In fact, only in very recent times, simulations for interesting systems of 
hundreds of thousand of atoms in the millisecond scale are starting to become af- 
fordable, being still, as we mentioned, the main limitation of these computational 
techniques the large difference between the elemental time step used to integrate 
the equations of motion and the total time span needed to obtain useful informa- 
tion. In this context, strategies to increase the time step are very valuable. 

A widely used method to this end is to constrain some of the internal degrees 
of freedom [8J of a molecule (typically bond lengths, sometimes bond angles and 
rarely dihedral angles. For a Verlet-like integrator [[9l [lOl, stability requires the 
time step to be at least about five times smaller than the period of the fastest 
vibration in the studied system [11]. Here is where constraints come into play. By 
constraining the hardest degrees of freedom, the fastest vibrational motions are 
frozen, and thus larger time steps still produce stable simulations. If constraints 
are imposed on bond lengths involving hydrogens, the time step can typically be 
increased by a factor of 2 to 3 (from 1 fs to 2 or 3 fs) [[T2l . Constraining additional 
internal degrees of freedom, such as heavy atoms bond lengths and bond angles, 
allows even larger timesteps [TTiT3[, but one has to be careful, since, as more and 
softer degrees of freedom are constrained, the more likely it is that the physical 
properties of the simulated system could be severely distorted [fT4] - fT6l . 

The essential ingredient in the calculation of the forces produced by the impo- 
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sition of constraints are the so-called Lagrange multipliers [17], and their efficient 
numerical evaluation is therefore of the utmost importance. In this work, we show 
that the fact that many interesting biological molecules are esentially linear poly- 
mers allows to calculate the Lagrange multipliers in order A^^ operations (for a 
molecule where Nc constraints are imposed) in an exact (up to machine preci- 
sion), non-iterative way. Moreover, we provide a method to do so which is based 
in a clever ordering of the constraints indices, and in a recently introduced al- 
gorithm for solving linear banded systems [|T8l . It is worth mentioning that, in 
the specialized literature, this possibility has not been considered as far as we are 
aware; with some works commenting that solving this kind of linear problems 
(or related ones) is costly (but not giving further details) |fT9l - l2T]| . and some other 
works explicitly stating that such a computation must take 0(N^) [|22l or 0{N^) 
[|23ll24l operations. Also, in the field of robot kinematics, many O(A^c) algorithms 
have been devised to deal with different aspects of constrained physical systems 
(robots in this case) [I25l-l27]|. but none of them tackles the calculation of the La- 
grange multipliers themselves. 

This work is structured as follows. In sec. |2| we introduce the basic formal- 
ism for the calculation of constraint forces and Lagrange multipliers. In sec. [3j 
we explain how to index the constraints in order for the resulting linear system of 
equations to be banded with the minimal bandwidth (which is essential to solve 
it efficiently). We do this starting by very simple toy systems and building on 
complexity as we move forward towards the final discussion about DNA and pro- 
teins; this way of proceeding is intended to help the reader build the corresponding 
indexing for molecules not covered in this work. In sec. |4} we apply the intro- 
duced technique to a polyalanine peptide using the AMBER molecular dynamics 
package and we compare the relative efficiency between the calculation of the La- 
grange multipliers in the traditional way (0(Nf.)) and in the new way presented 
here {0{Nc)). Finally, in sec. [5} we summarize the main conclusions of this work 
and outline some possible future applications. 



2 Calculation of the Lagrange multipliers 

If holonomic, rheonomous constraints are imposed on a classical system of n 
atoms, and the D'Alembert's principle is assumed to hold, its motion is the so- 
lution of the following system of differential equations [[T7ll28ll : 

= ^aix(t)) + V Ai(t)V,yix(t)) , a=l,...,n, (2.1a) 

a\x{t)) = 0, I=h...,N,, (2.1b) 
x(to) = xo , (2. Ic) 
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dt 



Xo 



(2. Id) 



where ( |2.1a| ) is the modified Newton's second law and ( |2.1b| ) are the equations 
of the constraints themselves; Aj are the Lagrange multipliers associated with the 
constraints; F„ represents the external force acting on atom a, Xa is its Euclidean 
position, and x colectively denote the set of all such coordinates. We assume ¥„ 
to be conservative, i.e., to come from the gradient of a scalar potential function 
V{x)\ and YIIU ^i^acr'ix) should be regarded as the force of constraint acting on 
atom a. 

Also, in the above expression and in this whole document we will use the 
following notation for the different indices: 



• a,/3,y.6,^ = I, . . . ,n (except if otherwise stated) for atoms. 

• fi,v = I, . . . ,3n (except if otherwise stated) for the atoms coordinates when 
no explicit reference to the atom index needs to be made. 

• I,J= I, ... ,Nc for constrains and the rows and columns of the associated 
matrices. 

• k,las generic indices for products and sums. 



The existence of Nc constraints turns a system ofN = 3n differential equations 
with N unknowns into a system of N + A^, algebraic-differential equations with 



N + Nc unknowns. The constraints equations in (2.1b I are the new equations, and 
the Lagrange multipliers are the new unknowns whose value must be found in 
order to solve the system. 



If the functions (t'{x) are analytical, the system of equations in (7A) is equiv- 
alent to the following one: 



d^Xa(t) 
'"^ dt- 


= F,(x(0) + J] A,{t)Vc,(T'ix(t)) , 

7=1 


(2.2a) 


o-\x(to)) 


= 0, 


(2.2b) 


da'ixito)) 
dt 


= 0, 


(2.2c) 


dV^(x(0) 
dt- 


= , v?, 


(2.2d) 


xito) 




(2.2e) 


dxito) 
dt 


= -^0 ■ 


(2.2f) 
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In this new form, it exists a more direct path to solve for the Lagrange mul- 



tipliers: If we explicitly calculate the second derivative in eq. (2.2d) and then 
substitute eq. ( |2.2a[ ) where the accelerations appear, we arrive to 



dV^ 



Zl f ;> ^ + V' 



dxi" ^ dt dt dxi'dx" 



dxi" 

=: p' + q' + YjRuAj = 0, I=l,...,N,, (2.3) 

where we have implicitly defined 

, 1 da' I ^ ^ I 

p' := y —P,^ = E — ^« ■ ' (2.4a) 
m,, oXu ^—^ nta 

J ^ dx^ dx^ d^cr' 

i?// := y — 7— -T- = y — VX • V„(r^ , (2.4c) 

and it becomes clear that, at each t, the Lagrange multipliers Aj are actually a 
known function of the positions and the velocities. 
We shall use the shorthand 

o':=p' + q', I=\,...,N,, (2.5) 

and, o, p, and q to denote the whole A^^-tuples, as usual. 

Now, in order to obtain the Lagrange multiphers we just need to solve 

J]RuAj = -(p' + q') ^ RA = -o. (2.6) 

This is a linear system of A^^. equations and A^^ unknowns. In the following, 
we will prove that the solution to it, when constraints are imposed on typical 
biological polymers, can be found in 0{Nc) operations without the use of any 
iterative or truncation procedure, i.e., in an exact way up to machine precision. To 
show this, first, we will prove that the value of the vectors p and q can be obtained 
in 0{Nc^ operations. Then, we will show that the same is true for all the non-zero 
entries of matrix R, and finally we will briefly discuss the results in [18], where 



we introduced an algorithm to solve the system in (2.6) also in 0{Nc) operations. 

It is worth remarking at this point that, in this work, we will only consider 
constraints that hold the distance between pairs of atoms constant, i.e.. 



a"^'''''\x) := 14 - Xf,\' - {a^j^y , (2.7) 
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where is a constant number, and the fact that we can establish a correspon- 
dence between constrained pairs {a,P) and the constraints indices has been ex- 
plicitly indicated by the notation I{a,/3). 
This can represent a constraint on: 

• a bond length between atoms a and /3, 

• a bond angle between atoms a, /3 and j, if both a and /3 are connected to y 
through constrained bond lengths, 

• a principal dihedral angle involving a, /3, y and 6 (see [29| for a rigorous 
definition of the different types of internal coordinates), if the bond lengths 
{a,/3), (J3,y) and (y,S) are constrained, as well as the bond angles (a,/3,y) 
and (J3, y, S), 

• or a phase dihedral angle involving a, /3, y and 6 if the bond lengths (a,/3), 
(J3,y) and (J3,S) axe constrained, as well as the bond angles (a,/3,y) and 
(a,j3,6). 

This way to constrain degrees of freedom is called triangularization. If no 
triangularization is desired (as, for example, if we want to constrain dihedral an- 
gles but not bond angles), different explicit expressions than those in the follow- 
ing paragraphs must be written down, but the basic concepts introduced here are 
equally valid and the main conclusions still hold. 

Now, from eq. (2.7), we obtain 

Vycr'^"^^^ = 2(4 - xp){6y,a - 5y,p) . (2.8) 



Inserting this into (2.4a), we get a simple expression for p'^"fi'> 
P 



w) ._ y^F,^ = yJ-F;.vx<-/^) (2.9) 

m.. nr.. ^—^ m.. 



^ dXf, ^niy 



^ ■ {^a — ^li){6y^a ~ Sy^p) = 2(i^ - X/j) 



y y 



nia nip 



The calculation of q'^'^'P^ is more involved, but it also results into a simple 
expression: First, we remember that the indices run as /i, v = 1, . . . , 3«, and a = 
1, . . . , n, and we produce the following trivial relationship: 



4 = x^^-^l + x^^-^j + x^^k 
dXa _ d{x^"-H + x^"-^] + x^"k) 
dxi" ~ dxf 



= Sia-lJ + 53a-l,,j] + S3a,iik (2.10) 
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where /, j and k are the unitary vectors along the x, y and z axes, respectively. 



Therefore, much related to eq. (2.8), we can compute the first derivative of 



cr 



(2.11) 



and also the second derivative: 



dx^dx" 



^ ^ ^ ^ ^ ^ 

■ [(33a-2,vi + ^Sa-l.vj + Ssa.yk) - {S^p-lJ + ^S/i-l.vj + S3)3,yk)] 
= '2{63a-2,fi53a-2,v + ^3yS-2,^^3/3-2,v - ^3Q'-2,/i^3/3-2,v - ^3/3-2,/j(^3a-2,y 
+ (^3a-l,p^3a-l,y + <^3y3-l,/J^3/3-l,v ~ ^3q-1,/j^3;S-1,v ~ ^3/S-l,^i^3Q'-l,v 
+ Ssa,fiS3a,v + 53/3,^i^3/3,v - ^3(}-,//^3/?,v - ^3/3,M^3a.v) ■ (2.12) 



Taking this into the original expression for q'^"-f^'> in eq. (2.4b I and playing with 
the sums and the deltas, we arrive to 



E 



dx^' dx" d^o-'^"'^^ 
dt dt ^x^'^x'' 



= 2 



dx 



3a-2 



+ 2 



dt 



+ 2 

2 



dx^f'- 



dt 

.3q-\2 



+ 2 



dt 
dx^ 
dt 



dt 



dt dt 

dx^"'^ dx^^"' 
dt dt 

dx^"dx^f^^ 



dt dt 



= 2 



dXa 

"dT 



dx^ 
dt 



(2.13) 



Now, eqs. (2.5 1, (2.9 1 and (2.13) can be gathered together to become 



dxa dX/s 
dt dt 



+ 2(i^ - X/s) 



(2.14) 



where we can see that the calculation of o"'"'^'' takes always the same number 
of operations, independently of the number of atoms in our system, n, and the 
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number of constraints imposed on it, A^^.. Therefore, calculating the whole vector 



o in eq. (2.6) scales like A^c- 

In order to obtain an explicit expression for the entries of the matrix R, we 



now introduce eq. (2.8) into its definition in eq. (2.4c): 



R 



l(a,li)J(y,e) 



^=1 < 



./(7,e) 



= (-4 - Xfs) ■ (Xy - X,)(6i;^a - 5^,/3)(^Cy " ^C.f) 

= 4(f„-f,).(f,-f.)(^-^-^ + ^), (2.15) 

\ma nia nip nip) 



where we have used that 



(2.16) 



Looking at this expression, we can see that a constant number of operations 
(independent of n and A^,.) is required to obtain the value of every entry in R. The 
terms proportional to the Kroenecker deltas imply that, as we will see later, in 
a typical biological polymer, the matrix R will be sparse (actually banded if the 
constraints are appropriately ordered as we describe in the following sections), 
being the number of non-zero entries actually proportional to A'^^. More precisely, 
the entry Ru will only be non-zero if the constraints / and J share an atom. 



Now, since both the vector o and the matrix R in eq. (2.6) can be computed in 
0{Nc) operations, it only remains to be proved that the solution of the linear sys- 
tem of equations is also an 0{Nc) process, but this is a well-known fact when the 
matrix defining the system is banded. In [18J, we introduced a new algorithm to 
solve this kind of banded systems which is faster and more accurate than existing 
alternatives. Essentially, we shown that the linear system of equations 



Ax = b 



(2.17) 



where A is a J x J matrix, x is the J x 1 vector of the unknowns, is a given J x 1 
vector and A is banded, i.e., it satisfies that for known m < n 



Aij^K = \IK>m,\II, 
Aj^Lj = ML>m,\II, 



(2.18) 
(2.19) 



can be directly solved up to machine precision in 0{d) operations. 
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This can be done using the following set of recursive equations for the auxil- 
iary quantities ^jj (see IfTSlI for details): 



^11 = 



i-i 



^11 



M=max(\,I—m) 
/-I 



IJ 



Cl 



^IJ - ^11 -Mj + ^IM^MJ 
M=max{ 1,7-m 
7-1 

M=max{ l,/-m| 
7-1 

bi + ^ ^imCm , 

M=max{7-m,l} 
min{7+m,n) 

•^7 = ^77^7 + ^ ^IKXk ■ 

K=I+1 



for I <J 



for I> J 



(2.20a) 
(2.20b) 
(2.20c) 
(2.20d) 
(2.20e) 



If the matrix A is symmetric (A// = Aj/), as it is the case with R [see (2.4c)], 
we can additionally save about one half of the computation time just by using 



^77 - ^Jl/^JJ 



for I > J , 



(2.21) 



instead of ( 2.20cp . Eq. (2.21 ) can be obtained from ( 2.20p by induction, and we 
recommend these expressions for the ^ coefficients because other valid ones (like 

considering = ^jj, ^// = 1 / ^Au - T,Ml,nax{\,i-m) ^im^mi, which involves square 
roots) are computationally more expensive. 

In the next sections, we show how to index the constraints in such a way 
that nearby indices correspond to constraints where involved atoms are close to 
each other and likely participate of the same constraints. In such a case, not only 
will the matrix R in eq. (2.6) be banded, allowing to use the method described 
above, but it will also have a minimal bandwidth m, which is also an important 
point, since the computational cost for solving the linear system scales as 0{Ncin^) 
(when the bandwidth is constant). 



3 Ordering of the constraints 

In this section we describe how to index the constraints applied to the bond lengths 
and bond angles of a series of model systems and biological molecules with the 
already mentioned aim of minimizing the computational cost associated to the 
obtention of the Lagrange multipliers. The presentation begins by deliberately 
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simple systems and proceeds to increasingly more complicated molecules with 
the intention that the reader is not only able to use the final results presented here, 
but also to devise appropriate indexings for diff'erent molecules not covered in this 
work. 

The main idea we have to take into account, as expressed in section [2[ is 
to use nearby numbers to index constraints containing the same atoms. If we 
do so, we will obtain banded R matrices. Further computational savings can be 



obtained if we are able to reduce the number of ^ coeflicients in eqs. (2.20) to 



be calculated. In more detail, solving a linear system like (2.6) where the R is 
Nc X Nc and banded with semi-band width (i.e., the number of non-zero entries 
neighbouring the diagonal in one row or column) m requires O(Ncm^) operations 
if m is a constant. Therefore, the lower the value of m, the smaller the number 
of required numerical eff"ort. When the semi-band width m is not constant along 
the whole matrix, things are more complicated and the cost is always between 
OiNc-m^^^) and 0(Ncin^^J, depending on how the different rows are arranged. 
In general, we want to minimize the number of zero fillings in the process of 
Gaussian elimination (see IfTSl for further details), which is achieved by not having 
zeros below non-zero entries. 

This is easier to understand with an example: Consider the following matrices, 
where Q. and co represent different non-zero values for every entry (i.e., not all co, 
nor all Q. must take the same value, and different symbols have been chosen only 
to highlight the main diagonal): 

I n io oj oj ... \ „ ( n 0) oj ... \ 
^■=\ Q ... ' a CO ... ■ ^^-'^ 



During the Gaussian elimination process that is behind (2.20), in A, five coef- 
ficients ^ above the diagonal are to be calculated, three in the first row and two in 
the second one, because the entries below non-zero entries become non-zero too 
as the elimination process advances (this is what we have called 'zero filling'). 
On the other hand, in B, which contains the same number of non-zero entries as 
A, only three coefficients ^ have to be calculated: two in the first row and one in 
the second row. Whether R looks like A or like B depends on our choice of the 
constraints ordering. 

One has also to take into account that no increase in the computational cost 
occurs if a series of non-zero columns is separated from the diagonal by columns 
containing all zeros. I.e., the linear systems associated to the following two ma- 
trices require the same numerical effort to be solved: 

'^Cla)coOOa)coO...\ ( Q co co oj co . . . 



C := 



Q CO CO CO 
a to CO 



D : = 



Q. CO CO CO 
Q. CO CO 



(3.2) 
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3.1 Open, single-branch chain with constrained bond lengths 



As promised, we start by a simple model of a biomolecule: an open linear chain 
without any branch. In this case, the atoms should be trivially numbered as in 



fig. 3.1 (any other arrangement would have to be justified indeed!). 

If we only constrain bond lengths, the fact that only consecutive atoms par- 
ticipate of the same constraints allows us to simplify the notation with respect to 



eq. (2.7 1 and establish the following ordering for the constraints indices: 

1(a) = a , I=\,...,n-\, (3.3) 

with 

(r'^"\xc„ Xa+i) := (Xa - Xa+lf - (a^^a+if = . (3.4) 

This choice results in a tridiagonal matrix R, whose only non-zero entries are 
those lying in the diagonal and its first neighbours. This is the only case for which 
an exact calculation of the Lagrange multipliers exists in the literature as far as we 
are aware [.30,1. 



3.2 Open, single-branch chain with constrained bond lengths 
and bond angles 

The next step in complexity is to constrain the bond angles of the same linear 
chain that we discussed above. The atoms are ordered in the same way, as in fig. 
|3.1[ and the trick to generate a banded matrix R with minimal bandwidth is to 
alternatively index bond length constraints with odd numbers, 

1(a) = 2a - I = 1,3,5, ... ,2n - 3 , with a = 1,2, . . . ,n - I , (3.5) 

and bond angle constraints with even ones, 

J(J3) = 2/3 = 2,4,6,..., 2n- 4, with /3 = 1,2, . . . ,n - 2 , (3.6) 

where the regular pattern involving the atom indicies that participate of the same 
constraints has allowed again to use a lighter notation. 




Figure 3.1: Numbering of the atoms in an open, single-branch chain. 
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a+3 




3a,a+2 Q+2 



Figure 3.2: Segment of a single-branch chain with constrained bond lengths and 
bond angles. In solid line, the distances that have to be kept constant to constrain 
the former; in dashed line, those that have to be kept constant to constrain the 
latter. 

The constraints equations in this case are 

O-^^^KXa, Xa+\) = (Xa - X^+l)^ - (a^.a+l)^ = , (3.7a) 

(T'^\^f,, = (4? - ■^/3+2)' - (a/sji^if = , (3.7b) 

respectively, and, if this indexing is used, i? is a banded matrix where m is 3 and 4 
in consecutive rows and columns. Therefore, the mean (m) is 3.5, and the number 
of ^ coefficients that have to be computed per row in the Gaussian elimination 
process is the same because the matrix contains no zeros that are filled. 

A further feature of this system (and other systems where both bond lengths 
and bond angles are constrained) can be taken into account in order to reduce 
the computational cost of calculating Lagrange multipliers in a molecular dynam- 
ics simulation: A segment of the linear chain with constrained bond lengths and 



bond angles is represented in fig. 3.2, where the dashed lines correspond to the 
virtual bonds between atoms that, when kept constant, implement the constraints 
on bond angles (assuming that the bond lengths, depicted as solid lines, are also 
constrained). 

Due to the fact that all these distances are constant, many of the entries of R 
will remain unchanged during the molecular dynamics simulation. As an example, 
we can calculate 

R2a-\,2a-2 = — (jc„ - x^+i) ■ (jc„ - Xa+i) = — ' — COS lia + l,a,a + 2) 



2m^ 

where we have used the law of cosines. The right-hand side does not depend on 
any time-varying objects (such as Xa), being made of only constant quantities. 
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Figure 3.3: Numbering of the atoms in a minimally branched molecule. 

Therefore, the value of R2a-\,2a-2 (and many other entries) needs not to be recal- 
culated in every time step, which allows to save computation time in a molecular 
dynamics simulation. 

3.3 Minimally branched molecules with constrained bond lengths 

In order to incrementally complicate the calculations, we now turn to a linear 
molecule with only one atom connected to the backbone, such the one displayed 



in figure 3.3 



The corresponding equations of constraint and the ordering in the indices that 
minimizes the bandwidth of the linear system are 



'1,2 



= 0, 



cr^^'^^ = (f,-f,+i)^-<_„^i=0, 1(a) 



a 



2,4,6, ...,n-2. 



0, J(J3)=/3+l=3,5,l,...,n-l 



(3.9a) 
(3.9b) 
(3.9c) 



where the trick this time has been to alternatively consider atoms in the backbone 
and atoms in the branches as we proceed along the chain. 

The matrix R of this molecule presents a semi-band width which is alterna- 
tively 2 and 1 in consecutive rows/columns, with average (m) = 1.5 and the same 
number of superdiagonal f coefficients to be computed per row. 



3.4 Alkanes with constrained bond lengths 

The next molecular topology we will consider is that of an alkane (a familiy of 
molecules with a long tradition in the field of constraints [|T9l), i.e., a linear back- 
bone with two 1-atom branches attached to each site (see fig. |3.4[ ). 

The ordering of the constraints that minimizes the bandwidth of the linear 
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system for this case is 



(r\x) = (xi - xjf - j = , (3.10a) 

cr'^''\x) = (4 - 4v+3)' - ^^Ls,. = ' = a = 2, 5, 8, . . . , n - 3 , (3. 10b) 

cr'^\x) = i^fs - x/s^i)^ - 4^,^^ = , J(j3)=/3+\=3,6,9,...,n-2, (3.10c) 

cr^'^^\x) = (fy - f^+2)' - a'+2,y = ' K{y) = y + 2 = 4, 7, 10, . . . , n - 1 , 

(3.10d) 

where the trick has been in this case to alternatively constrain the bond lengths in 
the backbone and those connecting the branching atoms to one side or the other. 
The resulting R matrix require the calculation of 2 ^ coefficients per row when 
solving the linear system. 



3.5 Minimally branched molecules with constrained bond lengths 
and bond angles 

If we want to additionally constrain bond angles in a molecule with the topology 



in fig. 3.3 the following ordering is convenient: 



cr\x) = (xi - X2)^ - a] 2 = , 



cr^(x) = (X2 - xjf - a? 3 = , 



*2,3 
\2 „2 



cr (x) = (xi - X4) - 4 = , 



(3.11a) 
(3.11b) 
(3.11c) 

cr''">ix) = (f„ - 4+1)' ' = 2a - 2 = 4, 8, 12, . . . , 2n - 4 , 

(3. lid) 

0, 7(je) = 2y8 + 2 = 5,9, 13,...,2n-3, 

(3.11e) 

0, ^(7) = 2y-2 = 6, 10, 14,...,2n-6, 

(3.1 If) 



cr'^\x) = {^„-^^^2f-al 



'lifi+2 



(r^(^^(;c) = (£,-£,+1)2-^2 





Figure 3.4: Numbering of the atoms in a model alkane chain. 



14 



o-'^^'\x) = (f, - f,+4)' - = ' Li^) = 25 + 3 = 7, 1 1, 15, . . . , 2n - 5 . 

(3.11g) 

This ordering produces 16 non-zero entries above the diagonal per each group 
of 4 rows in the matrix R when making the calculations to solve the associated 
linear system. This is, we will have to calculate a mean of 16/4 = 4 super-diagonal 
coefficients ^ per row. When we studied the linear molecule with constrained 
bond lengths and bond angles, this mean was equal to 3.5, so including minimal 
branches in the linear chain makes the calculations just slightly longer. 



3.6 Alkanes with constrained bond lengths and bond angles 

If we now want to add bond angle constraints to the bond length ones described 



3.4 for alkanes, the following ordering produces a matrix R with a low 



m sec. 



half -band width: 



cr\x) = (X2 - fif - ^2 = , (3.12a) 

cr^ix) = {xj - X2f - = , (3.12b) 

o^{x) = (m - - a2,4 = , (3.12c) 

a\x) = (fj -xif- a?,5 = , (3.1 2d) 

cr\x) = i^5-X3f-al,=0, (3.12e) 

(r\x) = (fj - m)^ - ^4,5 = , (3. 12f) 

cr'^''\x) = (f„ - 4+3)' - = ' ^((^) = 2a + 3 = 7, 13, 19, . . . , 2n - 3 , 

(3.12g) 

cr'^\x) = (j?s - ■?/?+!)' - 4/^+1 = , yce) = 2;8 - 2 = 8, 14, 20, . . . , 2n - 8 , 

(3.12h) 

cr^^^Kx) = (f^ - f^+2)' - a;,^+2 = ' K(y) = 2r - 1 = 9, 15, 21, . . . , 2n - 7 , 

(3.12i) 

cr^'^'\x) = (4- - 4>6)' - a^,+6 = , m = 25 + 6 = 10, 16, 22, . . . , 2n - 6 , 

(3.12j) 

cr^(^)(x) = {x, - x,+2f - a^,+2 = ' Mie) = 2e - 1 = 11, 17, 23, . . . , 2n - 5 , 

(3.12k) 

(r^(^)(x) = (ff - ff+i)2 - a^^^i = , A^(^) = 2^ - 2 = 12, 18, 24, . . . , 2n - 4 . 

(3.121) 

In this case, the average number of ^ coefficients to be calculated per row is 
approximately 5.7. 
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Figure 3.5: Numbering of the atoms for cyclic molecules: a) without branches; b) 
minimally branched. 



3.7 Cyclic chains 

If we have cycles in our molecules, the indexing of the constraints is only slightly 
modified with respect to the open cases in the previous sections. For example, if 
we have a single-branch cyclic topology, such as the one displayed in fig. |3.5^ , 
the ordering of the constraints is the following: 

(r^(")(x) = (f,-f,+i)2-(a„,,+i)' = 0, /(a)= l,...,n-l =a, (3.13a) 
cr"W = (fi - Xnf - (ai,„)2 = . (3.13b) 

These equations are the same as those in |3.1[ plus a final constraint corre- 
sponding to the bond which closes the ring. These constraints produce a matrix R 
where only the diagonal entries, its first neighbours, and the entries in the comers 



(i?i,„ and 7?„j) are non-zero. In this case, the associated linear system in eq. (2.6) 



can also be solved in 0{Nc) operations, as we discuss in IfTSll . In general, this is 
also valid whenever i? is a sparse matrix with only a few non-zero entries outside 
of its band, and therefore we can apply the technique introduced in this work to 
molecular topologies containing more than one cycle. 

The ordering of the constraints and the resulting linear systems for different 
cyclic species, such as the one depicted in fig. 3.5\ i, can be easily constructed by 
the reader using the same basic ideas. 



3.8 Proteins 

As we discussed in sec. [T| proteins are one of the most important families of 
molecules from the biological point of view: Proteins are the nanomachines that 
perform most of the complex tasks that need to be done in living organisms, and 
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a+4'ra+T+3 



Qt 




a+3 
d+1 a+T+5 




Figure 3.6: Scheme of the residue of a protein, a) Numbering of the atoms; a 
represents the first numbered atom in each residue (the amino nitrogen) and T is 
the number of atoms in the side chain, b) Indexing of the bond length constraints; 
/ denotes the index of the first constraint imposed on the residue (the N-H bond 
length) and T' is the variable number of constraints imposed on the side chain. 



therefore it is not surprising that they are involved, in one way or another, in 
most of the diseases that affect animals and human beings. Given the efficiency 
and precision with which proteins carry out their missions, they are also being 
explored from the technological point of view. The applications of proteins even 
outside the biological realm are many if we could harness their power [IJ, and 
molecular dynamics simulations of great complexity and scale are being done in 
many laboratories around the world as a tool to understand them [|4l [3ni32]| . 

Proteins present two topological features that simplify the calculation of the 
Lagrange multipliers associated to constraints imposed on their degrees of free- 
dom: 

• They are linear polymers, consisting of a backbone with short (17 atoms 
at most) groups attached to it yj. This produces a banded matrix R, thus 
allowing the solution of the associated linear problem in 0(Nc) operations. 
Even in the case that disulfide bridges, or any other covalent linkage that 
disrupts the linear topology of the molecule, exist, the solution of the prob- 



lem can still be found efficiently if we recall the ideas discussed in sec. 3.7 



The monomers that typically make up these biological polymers, i.e., the 
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residues associated to the proteinogenic aminoacids, are only 20 different 
molecular structures. Therefore, it is convenient to write down explicitly 
one block of the R matrix for each known monomer, and to build the R 
matrix of any protein simply joining together the precalculated blocks asso- 
ciated to the corresponding residues the protein consists of. 

The structure of a segment of the backbone of a protein chain is depicted 
The green spheres represent the side chains, which are the part of 



in fig. 3.6 



the amino acid residue that can differ from one monomer to the next, and which 
usually consist of several atoms: from 1 atom in the case of glycine to 17 in 



arginine or tryptophan. In fig. 3.6 1, we present the numbering of the atoms, which 



will support the ordering of the constraints, and, in fig. 3.6?, the indexing of the 
constraints is presented for the case in which only bond lengths are constrained 
(the bond lengths plus bond angles case is left as an exercise for the reader). 

Using the same ideas and notation as in the previous sections and denoting by 
Rm the block of the matrix R that corresponds to a given amino acid residue M, 
with M = 1, . . . , Nn, we have that, for the monomer dettached of the rest of the 
chain. 



R 



M 



Q 


CO 












CO 


Q. 


CO 


CO 


CO 








CO 


Q 


CO 


CO 








CO 


OJ 




CO 












s 










CO 


OJ 


CO 


n 


OJ 


CO 










CO 


Q 


CO 










CO 


OJ 





(3.14) 



where the explicit non-zero entries are related to the constraints imposed on the 
backbone and S denotes a block associated to those imposed on the bonds that 
belong to the different sidechains. The dimension of this matrix is T' -I- 6 and the 
maximum possible semi -band width is 12 for the bulkiest residues. 

A protein's global matrix R has to be built by joining together blocks like the 
one above, and adding the non-zero elements related to the imposition of con- 
straints on bond lengths that connect one residue with the next. These extra el- 
ements are denoted by coc and a general scheme of the final matrix is shown in 



fig. 3.7 



The white regions in this scheme correspond to zero entries, and we can easily 
check that the matrix is banded. In fact, if each one of the diagonal blocks is con- 
structed conveniently, they will contain many zeros themselves and the bandwidth 
can be reduced further. The size of the ojc blocks will usually be much smaller 
than that of their neighbour diagonal blocks. For example, in the discussed case 
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in which we constrain all bond lengths, o^c are 1 x 2 (or 2 x 1) blocks, and the 
diagonal blocks size is between 7x7 (glycine) and 25 x 25 (tryptophan). 

3.9 Nucleic acids 

Nucleic acids are another family of very important biological molecules that can 
be tackled with the techniques described in this work. DNA and RNA, the two 
subfamilies of nucleic acids, consist of linear chains made up of a finite set of 
monomers (called 'bases'). This means that they share with proteins the two fea- 
tures mentioned in the previous section and therefore the Lagrange multipliers 
associated to the imposition of constraints on their degrees of freedom can be effi- 
ciently computed using the same ideas. It is worth mentioning that DNA typically 
appears in the form of two complementary chains whose bases form hydrogen- 
bonds. Since these bonds are much weaker than a covalent bond, imposing bond 



length constraints on them such as the ones in eq. (2.7) would be too unrealistic 
for many practical purposes. 



In fig. 3.8 , and following the same ideas as in the previous section, we propose 
a way to index the bond length constraints of a DNA strand which produces a 
banded matrix R of low bandwidth. Green spheres represent the (many-atom) 
bases (A, C, T or G), and the general path to be followed for consecutive constraint 
indices is depicted in the upper left corner: first the sugar ring, then the base and 





Figure 3.7: Scheme of the matrix R for a protein molecule with N'n residues. In 
black, we represent the potentially non-zero entries, and each large block in the 



diagonal is given by (2.4c). 
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finally the rest of the nucleotide, before proceeding to the next one in the chain. 

This ordering translates into the following form for the block of R correspond- 
ing to one single nucleotide dettached from the rest of the chain: 





M 


Rm 
M 


Rm 
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Rm - 


Rm 
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s 






Rm 

M 
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Rm 
M 



) 



where 5 is the block associated to the constraints imposed on the bonds that are 
contained in the base, Rj^ , Rf^ , Rf^ , and R^ are very sparse rectangular blocks 
with only a few non-zero entries in them, and the form of the diagonal blocks 




Figure 3.8: Constraints indexing of a DNA nucleotide a) General order to be fol- 
lowed, b) Indexing of the bond length constraints; / denotes the index of the first 
constraint imposed on the nucleotide and T is the variable number of constraints 
imposed on the bonds in the base. 
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associated to the sugar ring and backbone constraints is the following: 
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(3.16a) 



(3.16b) 



Analagously to the case of proteins, as many blocks as those in eq. (3.15) as 
nucleotides contains a given DNA strand have to be joined to produce the global 
matrix R of the whole molecule, together with the toc blocks associated to the 



constraints on the bonds that connect the different monomers. In fig. 3.9 a scheme 
of this global matrix is depicted and we can appreciate that it indeed banded. The 
construction of the matrix R for a RNA molecule should follow the same steps 
and the result will be very similar. 



4 Numerical calculations 

In this section, we apply the efficient technique introduced in this work to a series 
of polyalanine molecules in order to calculate the Lagrange multipliers when bond 
length constraints are imposed. We also compare our method, both in terms of 
accuracy and numerical efficiency, to the traditional inversion of the matrix R 
without taking into account its banded structure. 

We used the code Avogadro [j33l to build polyalanine chains of A^ies =2, 5, 12, 
20, 30, 40, 50, 60, 80, 90 and 100 residues, and we chose their initial conformation 
to be approximately an alpha helix, i.e., with the values of the Ramachandran 
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Figure 3.9: Scheme of the matrix R for a DNA molecule with Nj^ nucleotides. In 
black, we represent the potentially non-zero entries, and each large block in the 
diagonal is given by (|3.15[). 



angles in the backbone = -60" and i/r = -40" [HI. Next, for each of these 
chains, we used the molecular dynamics package AMBER [|34ll to produce the 
atoms positions (x), velocities (v) and external forces (F) needed to calculate the 
Lagrange multipliers (see sec. |2]) after a short equilibration molecular dynamics 
simulations. We chose to constrain all bond lengths, but our method is equally 
valid for any other choice, as the more common constraining only of bonds that 
involve hydrogens. 

In order to produce reasonable final conformations, we repeated the following 
process for each of the chains: 

• Solvation with explicit water molecules. 

• Minimization of the solvent positions holding the polypeptide chain fixed 
(3,000 steps). 

• Minimization of all atoms positions (3,000 steps). 

• Thermalization: changing the temperature from K to 300 K during 10,000 
molecular dynamics steps. 

• Stabilization: 20,000 molecular dynamics steps at a constant temperature of 
300 K. 

• Measurement of x, v and F. 
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Neutralization is not necessary, because our polyalanine chains are themselves 
neutral. In all calculations we used the force field described in [|35ll . chose a cutoff 
for Coulomb interactions of 10 A and a time step equal to 0.002 ps, and impose 
constraints on all bond lengths as mentioned. In the thermostated steps, we used 
Langevin dynamics with a collision frequency of 1 ps"^. 

Using the information obtained and the indexing of the constraints described 
in this work, we constructed the matrix R and the vector o and proceeded to find 
the Lagrange multipliers using eq. ( |2.6[ ). Since ( |2.6| ) is a linear problem, one 
straightforward way to solve is to use traditional Gauss-Jordan elimination or LU 
factorization IfTTl [36l . But these methods have a drawback: they scale with the 
cube of the size of the system. I.e., if we imposed Nc constraints on our sys- 




0.00001 '-^ • ' • ' 

2 5 12 30 50 100 




2 12 20 30 40 50 60 70 80 90 100 



Figure 4.1: Comparison of a) numerical complexity and b) accuracy between a 
traditional Gauss-Jordan solver (solid line) and the banded algorithm described in 
this work (dashed line), for the calculation of the Lagrange multipliers on a series 
of polyalanine chains as a function of their number of residues A^res- 
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Table 1 : Comparison of numerical complexity and accuracy between a traditional 
Gauss-Jordan solver and the banded algorithm described in this work, for the cal- 
culation of the Lagrange multipliers on a series of polyalanine chains as a function 
of their number of residues A^res- 



tem (and therefore we needed to obtain A^^. Lagrange multipliers), the number of 
floating point operations that these methods would require is proportional to A^^. 
However, as we showed in the previous sections, the fact that many biological 
molecules, and proteins in particular, are essentially linear, allows to index the 
constraints in such a way that the matrix R in eq. ( 2.6^ is banded and use differ- 



ent techniques for solving the problem which require only 0(Nc) floating point 
operations [18J. 

In fig. 4.1 and table |4} we compare both the accuracy and the execution time 



of the two different methods: Gauss-Jordan elimination [13611 . and the banded re- 
cursive solution advocated here and made possible by the appropriate indexing 
of the constraints. The calculations have been run on a Mac OS X laptop with 
a 2.26 GHz Intel Core 2 Duo processor, and the errors were measured using the 
normalized deviation of RA from -o. I.e., if we denote by A the solution provided 
by the numerical method. 

Error := . (4.1) 

From the obtained results, we can see that both methods produce an error 
which is very small (close to machine precision), being the accuracy of the banded 
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algorithm advocated in this work slightly higher. Regarding the computational 
cost, as expected, the Gauss-Jordan method presents an effort that approximately 
scales with the cube of the number of constraints Nc (which is approximately pro- 
portional to Nres), while the banded technique allowed by the particular structure 
of the matrix R follows a rather accurate lineal scaling. Although it is typical that, 
when two such different behaviours meet, there exists a range of system sizes for 
which the method that scales more rapidly is faster and then, at a given system 
size, a crossover takes place and the slower scaling method becomes more effi- 
cient from there on, in this case, and according to the results obtained, the banded 
technique is less time-consuming for all the explored molecules, and the crossover 
should exist at a very small system size (if it exists at all). This is very relevant 
for any potential uses of the methods introduced in this work. 

5 Conclusions 

We have shown that, if we are dealing with typical biological polymers, whose 
covalent connectivity is that of essentially linear objects, the Lagrange multipli- 
ers that need to be computed when A^^. constraints are imposed on their internal 
degrees of freedom (such as bond lengths, bond angles, etc.) can be obtained in 
0{Ni^) steps as long as the constraints are indexed in a convenient way and banded 
algorithms are used to solve the associated linear system of equations. This path 
has been traditionally regarded as too costly in the literature [ 19]i24J, and, there- 
fore, our showing that it can be implemented efficiently could have profound im- 
plications in the design of future molecular dynamics algorithms. 

Since the field of imposition of constraints in moleculary dynamics simula- 
tions is dominated by methods that cleverly achieve that the system exactly stays 
on the constrained subspace as the simulation proceeds by not calculating the 
exact Lagrange multipliers, but a modification of them instead [fT9l [37ll . we are 
aware that the application of the new techniques introduced here is not a direct 
one. However, we are confident that the low cost of the new method and its close 
relationship with the problem of constrained dynamics could prompt many ad- 
vances, some of which are already being pursued in our group. Among the most 
promising lines, we can mention a possible improvement of the SHAKE method 
[[T9ll by the use of the exact Lagrange multipliers as a guess for the iterative proce- 
dure that constitutes its most common implementation. Also, we are studying the 
possibility of solving the linear problems that appear either in a different imple- 
mentation of SHAKE (mentioned in the original work too [19]) or in the LINCS 
method ^31\, and which are defined by matrices which are different from but re- 
lated to the matrix R introduced in this work, being also banded if an appropriate 
indexing of the constraints is used. Finally, we are exploring an extension of the 
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ideas introduced here to the calculation not only of the Lagrange multipliers but 
also of their time derivatives, to be used in higher order integrators than Verlet. 
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